Data extraction and annotation based on domain-specific ontology evolution for deep web

Authors

  • Kerui Chen
  • Wanli Zuo
  • Fengling He
  • Yongheng Chen
  • Ying Wang
Abstract

Deep web sources respond to a user query with result records encoded in HTML files. Data extraction and data annotation, which are important for many applications, extract and annotate the records from these HTML pages. We propose a domain-specific, ontology-based data extraction and annotation technique. First, we construct a mini-ontology for a specific domain from the information in the query interface and the query result pages. Then, we use the constructed mini-ontology to identify data areas and to map data annotations during data extraction. To adapt to new sample sets, the mini-ontology evolves dynamically based on the results of data extraction and data annotation. Experimental results demonstrate that this method has higher precision and recall in data extraction and data annotation.
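The annotate-then-evolve loop described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the `Concept` and `MiniOntology` classes, the exact-match rule on labels and values, and the sample book records are all assumptions made for illustration, and data-area identification is not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept of a hypothetical mini-ontology (illustrative only)."""
    name: str
    labels: set = field(default_factory=set)  # label texts, e.g. from a query interface
    values: set = field(default_factory=set)  # instance values seen in result pages


class MiniOntology:
    def __init__(self):
        self.concepts = {}

    def add_concept(self, name, labels=(), values=()):
        self.concepts[name] = Concept(
            name,
            {label.lower() for label in labels},
            {value.lower() for value in values},
        )

    def annotate(self, label, value):
        """Return the name of the first concept matching the field, or None."""
        label, value = label.strip().lower(), value.strip().lower()
        for concept in self.concepts.values():
            if label in concept.labels or value in concept.values:
                return concept.name
        return None

    def evolve(self, label, value, concept_name):
        """Store a newly seen label and value under an already matched concept."""
        concept = self.concepts[concept_name]
        if label.strip():
            concept.labels.add(label.strip().lower())
        if value.strip():
            concept.values.add(value.strip().lower())


def annotate_record(ontology, record):
    """Annotate each (label, value) field, then let the ontology evolve."""
    annotated = []
    for label, value in record:
        concept = ontology.annotate(label, value)
        if concept is not None:
            ontology.evolve(label, value, concept)
        annotated.append((value, concept))
    return annotated


# Made-up seed ontology and records, assumed for this example only.
onto = MiniOntology()
onto.add_concept("Author", labels=["author", "by"])
onto.add_concept("Publisher", labels=["publisher"], values=["springer"])

first = annotate_record(onto, [("Author", "Jane Doe"), ("Press", "Springer")])
second = annotate_record(onto, [("Press", "Elsevier")])
```

In this sketch the unknown label "Press" is first resolved through the known value "Springer". After the ontology has evolved, the same label is enough to annotate the previously unseen value "Elsevier".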

Similar resources

Annotation for Query Result Records based on Domain-Specific Ontology

The World Wide Web is enriched with a large collection of data, scattered across deep web databases and web pages in unstructured or semi-structured formats. Recently evolving customer-friendly web applications need special data extraction mechanisms to draw out the required data from these deep web sources according to the end-user query and to populate the output page dynamically at the fastest rate. In...

Robust and Efficient Annotation based on Ontology Evolution for Deep Web Data

Among research topics on the Deep Web, data annotation is still at a preliminary stage compared to data extraction, which is more mature. Currently, although the approach of applying ontologies to data annotation has been accepted by most researchers, many weaknesses remain, such as the complexity of the ontology, as well as the limitation on static ontology’...

Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

Due to the growth of the web, there are many challenges in establishing a general framework for data mining and for retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...

Journal:
  • Comput. Sci. Inf. Syst.

Volume 8

Publication date: 2011